Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Abstract

Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain, viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to this problem typically separate out the localization, segmentation, and recognition steps. We instead propose a unified approach that integrates these three steps via a deep convolutional neural network that operates directly on the image pixels. We employ the DistBelief implementation of deep neural networks in order to train large, distributed neural networks on high-quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, which has eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state of the art, achieving 97.84% accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over 90% accuracy. To further explore the applicability of the proposed system to broader text recognition tasks, we apply it to synthetic distorted text from reCAPTCHA, one of the most secure reverse Turing tests, which uses distorted text to distinguish humans from bots. We report 99.8% accuracy on the hardest category of reCAPTCHA. Our evaluations on both tasks indicate that, at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators.
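The abstract states that a single network reads the whole number directly from the pixels and that human-level performance holds "at specific operating thresholds", but it does not describe the output layer or how the threshold is applied. The following is a minimal sketch under a stated assumption: the network ends in one softmax over the sequence length and one softmax per digit position, so a candidate number is scored as the length log-probability plus the per-position digit log-probabilities. `decode_number`, `coverage_and_accuracy`, the five-position limit, and every probability in the usage lines are hypothetical illustrations, not taken from the paper or from DistBelief. The second function shows the trade-off an operating threshold controls: raising it discards low-confidence reads (lower coverage) in exchange for higher accuracy on the reads that are kept.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical guard (not from the paper): avoids log(0) if a probability underflows.
_EPS = 1e-12


def decode_number(
    length_probs: Sequence[float],
    digit_probs: Sequence[Sequence[float]],
) -> Tuple[str, float]:
    """Return the most probable digit string and its log-probability.

    ASSUMED output layout, for illustration only:
      length_probs[n]   = P(L = n | X) for n = 0..N (no "longer than N" class here)
      digit_probs[i][d] = P(S_{i+1} = d | X) for digit d in 0..9
    Score of a string s of length n:
      log P(L = n | X) + sum_{i=1..n} log P(S_i = s_i | X)
    """
    if len(digit_probs) < len(length_probs) - 1:
        raise ValueError("need one digit distribution per possible position")

    best_digits, best_logp = "", -math.inf
    prefix: List[str] = []
    prefix_logp = 0.0  # running sum of the best digit log-probs so far
    for n, p_len in enumerate(length_probs):
        if n > 0:
            # Positions are scored independently, so the best length-n string
            # extends the best length-(n-1) string by the arg-max digit at n.
            probs = digit_probs[n - 1]
            d = max(range(len(probs)), key=probs.__getitem__)
            prefix.append(str(d))
            prefix_logp += math.log(max(probs[d], _EPS))
        logp = math.log(max(p_len, _EPS)) + prefix_logp
        if logp > best_logp:
            best_digits, best_logp = "".join(prefix), logp
    return best_digits, best_logp


def coverage_and_accuracy(
    predictions: Sequence[Tuple[str, float, str]],
    threshold: float,
) -> Tuple[float, Optional[float]]:
    """Keep predictions whose confidence is >= threshold.

    Each item is (predicted, confidence, ground_truth). Returns the fraction
    kept (coverage) and the accuracy on the kept items (None if none kept).
    """
    kept = [(p, t) for p, c, t in predictions if c >= threshold]
    coverage = len(kept) / len(predictions) if predictions else 0.0
    if not kept:
        return coverage, None
    return coverage, sum(p == t for p, t in kept) / len(kept)


def _peaked(d: int, p: float) -> List[float]:
    """Hypothetical digit distribution: probability p on digit d, rest spread evenly."""
    row = [(1.0 - p) / 9] * 10
    row[d] = p
    return row


# Made-up network outputs for a five-position model.
length_probs = [0.01, 0.02, 0.07, 0.80, 0.05, 0.05]
digit_probs = [_peaked(1, 0.9), _peaked(2, 0.8), _peaked(5, 0.7),
               _peaked(0, 0.4), _peaked(0, 0.3)]
digits, logp = decode_number(length_probs, digit_probs)
print(digits, round(math.exp(logp), 4))  # 125 0.4032
```

Because each position is scored independently under this assumption, the best string of each length extends the best string of the previous length by one arg-max digit, so decoding takes time linear in the number of positions instead of enumerating every possible digit string.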